First, we will import the SPdel and all modules need. Note that you need to install before the SPdel. Then, we will use a fasta file with names in the format >genus_species_individual
import SPdel
import os
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta)
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs
nominal=SPdel.run_nominal(basepath,Inputs)
##################### Nominal MOTUs ##################### #####LS_brn##### LS_brn_L930, LS_brn_L931, LS_brn_L932 #####LS_con##### LS_con_L210, LS_con_L211, LS_con_L286, LS_con_L291, LS_con_L292, LS_con_L820 #####LS_elo##### LS_elo_L1041, LS_elo_L1046, LS_elo_L1047, LS_elo_L287, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L309 #####LS_gar##### LS_gar_L293, LS_gar_L294, LS_gar_L295, LS_gar_L296, LS_gar_L298 #####LS_mac##### LS_mac_B061, LS_mac_B082, LS_mac_B086, LS_mac_B087, LS_mac_B088, LS_mac_B089, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L212, LS_mac_L225, LS_mac_L290, LS_mac_L890, LS_mac_L891 #####LS_muy##### LS_muy_L907, LS_muy_L913, LS_muy_L914, LS_muy_L915 #####LS_obt##### LS_obt_B031, LS_obt_B070, LS_obt_B071, LS_obt_B074, LS_obt_B075, LS_obt_B077, LS_obt_B090, LS_obt_B100, LS_obt_B101, LS_obt_B102, LS_obt_B103, LS_obt_L004, LS_obt_L007, LS_obt_L008, LS_obt_L009, LS_obt_L013, LS_obt_L016, LS_obt_L084, LS_obt_L253, LS_obt_L266, LS_obt_L267, LS_obt_L268, LS_obt_L269, LS_obt_L282, LS_obt_L283, LS_obt_L314, LS_obt_L315, LS_obt_L316, LS_obt_L320, LS_obt_L547, LS_obt_L548 #####LS_piv##### LS_piv_B060, LS_piv_B076, LS_piv_B078, LS_piv_B093, LS_piv_B094, LS_piv_B095, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099, LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144, LS_piv_L002, LS_piv_L003, LS_piv_L005, LS_piv_L006, LS_piv_L010, LS_piv_L011, LS_piv_L014, LS_piv_L284, LS_piv_L371 #####LS_rei##### LS_rei_B072, LS_rei_B073, LS_rei_L338, LS_rei_L339, LS_rei_L341, LS_rei_L342, LS_rei_L343, LS_rei_L353, LS_rei_L355, LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 #####LS_tri##### LS_tri_L179, LS_tri_L180, LS_tri_L182, LS_tri_L519, LS_tri_L618, LS_tri_L621, LS_tri_L690, LS_tri_L955 Using k2p distance
After calculating all genetic distances, we can obtain a summary table with the mean and maximum of intraspecific distance, the nearest deighbor (NN), and the distance to the NN (also know as minimum interspecific)
nominal.print_summary()
| Mean | Max | NN | DtoNN | |
|---|---|---|---|---|
| LS_brn | 0.000000 | 0.00000 | LS_obt | 6.77516 |
| LS_con | 2.127067 | 3.98825 | LS_obt | 5.60360 |
| LS_elo | 0.037100 | 0.16695 | LS_obt | 2.73593 |
| LS_gar | 0.000000 | 0.00000 | LS_obt | 7.68126 |
| LS_mac | 0.903538 | 1.85854 | LS_tri | 4.51779 |
| LS_muy | 7.655915 | 15.31183 | LS_tri | 7.47916 |
| LS_obt | 1.937511 | 6.71724 | LS_elo | 2.73593 |
| LS_piv | 0.266286 | 1.00758 | LS_obt | 2.90372 |
| LS_rei | 0.316828 | 0.70177 | LS_con | 6.14484 |
| LS_tri | 3.618149 | 6.33176 | LS_mac | 4.51779 |
nominal.print_summary_all()
| minimum | mean | maximum | |
|---|---|---|---|
| intra | 0.00000 | 1.68624 | 15.31183 |
| inter | 2.73593 | 9.49266 | 15.15938 |
We can plot the maximum intraspecific vs the minimum interspecific of each taxa. The taxa with minimum interspecific higher than the maximum intraspecific are below the diagonal line.
nominal.plot_max_min()
Also, we can obtain a barcoding gap plot. In case of barcoding gap there will be not superposition among intraspecific and interspecific distances
nominal.plot_freq()
If you want to explore the data, you can plot a heatmap of pairwsise genetic distance. Note the scale!
nominal.plot_heatmap()
In some dataset the range of values are so wide that the difference in the lower values could be masked, so you can use a upper threshold on your scale. In this case the existence of divergente inidividuals in LS_obt is revealed.
nominal.plot_heatmap(upper=3)
SPdel calculates genetic distances using a Kimura 2-parameters (K2p) model as default, but also you can set a p-distance model.
nominal=SPdel.run_nominal(basepath,Inputs,dis='p')
##################### Nominal MOTUs ##################### #####LS_brn##### LS_brn_L930, LS_brn_L931, LS_brn_L932 #####LS_con##### LS_con_L210, LS_con_L211, LS_con_L286, LS_con_L291, LS_con_L292, LS_con_L820 #####LS_elo##### LS_elo_L1041, LS_elo_L1046, LS_elo_L1047, LS_elo_L287, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L309 #####LS_gar##### LS_gar_L293, LS_gar_L294, LS_gar_L295, LS_gar_L296, LS_gar_L298 #####LS_mac##### LS_mac_B061, LS_mac_B082, LS_mac_B086, LS_mac_B087, LS_mac_B088, LS_mac_B089, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L212, LS_mac_L225, LS_mac_L290, LS_mac_L890, LS_mac_L891 #####LS_muy##### LS_muy_L907, LS_muy_L913, LS_muy_L914, LS_muy_L915 #####LS_obt##### LS_obt_B031, LS_obt_B070, LS_obt_B071, LS_obt_B074, LS_obt_B075, LS_obt_B077, LS_obt_B090, LS_obt_B100, LS_obt_B101, LS_obt_B102, LS_obt_B103, LS_obt_L004, LS_obt_L007, LS_obt_L008, LS_obt_L009, LS_obt_L013, LS_obt_L016, LS_obt_L084, LS_obt_L253, LS_obt_L266, LS_obt_L267, LS_obt_L268, LS_obt_L269, LS_obt_L282, LS_obt_L283, LS_obt_L314, LS_obt_L315, LS_obt_L316, LS_obt_L320, LS_obt_L547, LS_obt_L548 #####LS_piv##### LS_piv_B060, LS_piv_B076, LS_piv_B078, LS_piv_B093, LS_piv_B094, LS_piv_B095, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099, LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144, LS_piv_L002, LS_piv_L003, LS_piv_L005, LS_piv_L006, LS_piv_L010, LS_piv_L011, LS_piv_L014, LS_piv_L284, LS_piv_L371 #####LS_rei##### LS_rei_B072, LS_rei_B073, LS_rei_L338, LS_rei_L339, LS_rei_L341, LS_rei_L342, LS_rei_L343, LS_rei_L353, LS_rei_L355, LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 #####LS_tri##### LS_tri_L179, LS_tri_L180, LS_tri_L182, LS_tri_L519, LS_tri_L618, LS_tri_L621, LS_tri_L690, LS_tri_L955 Using p-distance
nominal.print_summary()
| Mean | Max | NN | DtoNN | |
|---|---|---|---|---|
| LS_brn | 0.000000 | 0.00000 | LS_obt | 6.37066 |
| LS_con | 2.044443 | 3.83333 | LS_obt | 5.33333 |
| LS_elo | 0.037038 | 0.16667 | LS_obt | 2.66667 |
| LS_gar | 0.000000 | 0.00000 | LS_obt | 7.15503 |
| LS_mac | 0.891604 | 1.83333 | LS_tri | 4.33333 |
| LS_muy | 6.750000 | 13.50000 | LS_tri | 7.00000 |
| LS_obt | 1.872420 | 6.33333 | LS_elo | 2.66667 |
| LS_piv | 0.264893 | 1.00000 | LS_obt | 2.83333 |
| LS_rei | 0.314661 | 0.69686 | LS_con | 5.83333 |
| LS_tri | 3.428571 | 6.00000 | LS_mac | 4.33333 |
The most used DNA barcoding marker is COI, a codificant gene. So, its important for quality check to test if there is at leat one ORF without stop codons. In this case you can use CODE='VER' for mitonchondrial vertebrate or CODE='INV' for mithocondrial invertebrate o using the number code from NCBI https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c
Inputs=SPdel.reading_data(fasta,CODE='VER')
Inputs.stop_codon()
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs Checking stop codons using VER genetic code All sequences have at least one ORF without stop codons
C:\Users\jramirez\anaconda3\lib\site-packages\Bio\Seq.py:2334: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
Inputs=SPdel.reading_data(fasta,CODE=2)
Inputs.stop_codon()
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs Checking stop codons using 2 genetic code All sequences have at least one ORF without stop codons
With SPdel you can perform single-gene species delimitation methods. In this case we will calulate MOTUs using PTP method (Zhang et al., 2013) and obtain the same graphics that the nominal analysis! Note the difference in the graphics.
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
tree = './data/Megaleporinus/Megaleporinus_tree.nwk'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta,tree)
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs
PTP=SPdel.run_PTP(basepath,Inputs)
Speciation rate: 43.602 Coalesecnt rate: 1666.347 Null logl: 959.787 MAX logl: 1249.664 P-value: 0.000 Kolmogorov-Smirnov test for model fitting: Speciation: Dtest = 0.539 p-value >= 0.1 excellent model fitting Coalescent: Dtest = 2.220 p-value < 0.01 poor model fitting Number of species: 18 ##################### PTP MOTUs ##################### #####MOTU_01##### LS_muy_L913, LS_muy_L915, LS_muy_L914 #####MOTU_02##### LS_muy_L907 #####MOTU_03##### LS_brn_L930, LS_brn_L931, LS_brn_L932 #####MOTU_04##### LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295 #####MOTU_05##### LS_tri_L179, LS_tri_L180, LS_tri_L182 #####MOTU_06##### LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292 #####MOTU_07##### LS_con_L210, LS_con_L211 #####MOTU_08##### LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621 #####MOTU_09##### LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102 #####MOTU_10##### LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047 #####MOTU_11##### LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320 #####MOTU_12##### LS_obt_L084 #####MOTU_13##### LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088 #####MOTU_14##### LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290 #####MOTU_15##### LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 #####MOTU_16##### LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353 #####MOTU_17##### LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144 #####MOTU_18##### LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099 Using k2p distance
PTP.print_summary()
| Mean | Max | NN | DtoNN | |
|---|---|---|---|---|
| MOTU_01 | 0.000000 | 0.00000 | MOTU_03 | 11.60407 |
| MOTU_02 | NaN | NaN | MOTU_08 | 7.47916 |
| MOTU_03 | 0.000000 | 0.00000 | MOTU_09 | 6.77516 |
| MOTU_04 | 0.000000 | 0.00000 | MOTU_09 | 7.68126 |
| MOTU_05 | 0.000000 | 0.00000 | MOTU_08 | 6.33176 |
| MOTU_06 | 0.000000 | 0.00000 | MOTU_07 | 3.98825 |
| MOTU_07 | 0.000000 | 0.00000 | MOTU_06 | 3.98825 |
| MOTU_08 | 0.000000 | 0.00000 | MOTU_14 | 4.51779 |
| MOTU_09 | 0.143081 | 0.50167 | MOTU_11 | 2.84291 |
| MOTU_10 | 0.037100 | 0.16695 | MOTU_11 | 2.73593 |
| MOTU_11 | 0.000000 | 0.00000 | MOTU_10 | 2.73593 |
| MOTU_12 | NaN | NaN | MOTU_17 | 2.90372 |
| MOTU_13 | 0.136404 | 0.34101 | MOTU_14 | 1.55005 |
| MOTU_14 | 0.000000 | 0.00000 | MOTU_13 | 1.55005 |
| MOTU_15 | 0.000000 | 0.00000 | MOTU_16 | 0.67115 |
| MOTU_16 | 0.000000 | 0.00000 | MOTU_15 | 0.67115 |
| MOTU_17 | 0.083475 | 0.16695 | MOTU_18 | 0.67024 |
| MOTU_18 | 0.061102 | 0.16772 | MOTU_17 | 0.67024 |
PTP.plot_max_min()
PTP.plot_freq()
PTP.plot_heatmap()
Also, you can calculate the PTP using the bPTP web server (https://species.h-its.org/) and use the output file.
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
PTP_out= './data/Megaleporinus/PTP.PTPhSupportPartition.txt'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta)
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs
PTP=SPdel.run_PTPList(basepath,Inputs,PTP_out)
##################### PTP MOTUs ##################### #####MOTU_01##### LS_muy_L913, LS_muy_L915, LS_muy_L914 #####MOTU_02##### LS_muy_L907 #####MOTU_03##### LS_brn_L930, LS_brn_L931, LS_brn_L932 #####MOTU_04##### LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295 #####MOTU_05##### LS_tri_L179, LS_tri_L180, LS_tri_L182 #####MOTU_06##### LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292 #####MOTU_07##### LS_con_L210, LS_con_L211 #####MOTU_08##### LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621 #####MOTU_09##### LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102 #####MOTU_10##### LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047 #####MOTU_11##### LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320 #####MOTU_12##### LS_obt_L084 #####MOTU_13##### LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088 #####MOTU_14##### LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290 #####MOTU_15##### LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 #####MOTU_16##### LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353 #####MOTU_17##### LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144 #####MOTU_18##### LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099 Using k2p distance
PTP.print_summary()
| Mean | Max | NN | DtoNN | |
|---|---|---|---|---|
| MOTU_01 | 0.000000 | 0.00000 | MOTU_03 | 11.60407 |
| MOTU_02 | NaN | NaN | MOTU_08 | 7.47916 |
| MOTU_03 | 0.000000 | 0.00000 | MOTU_09 | 6.77516 |
| MOTU_04 | 0.000000 | 0.00000 | MOTU_09 | 7.68126 |
| MOTU_05 | 0.000000 | 0.00000 | MOTU_08 | 6.33176 |
| MOTU_06 | 0.000000 | 0.00000 | MOTU_07 | 3.98825 |
| MOTU_07 | 0.000000 | 0.00000 | MOTU_06 | 3.98825 |
| MOTU_08 | 0.000000 | 0.00000 | MOTU_14 | 4.51779 |
| MOTU_09 | 0.143081 | 0.50167 | MOTU_11 | 2.84291 |
| MOTU_10 | 0.037100 | 0.16695 | MOTU_11 | 2.73593 |
| MOTU_11 | 0.000000 | 0.00000 | MOTU_10 | 2.73593 |
| MOTU_12 | NaN | NaN | MOTU_17 | 2.90372 |
| MOTU_13 | 0.136404 | 0.34101 | MOTU_14 | 1.55005 |
| MOTU_14 | 0.000000 | 0.00000 | MOTU_13 | 1.55005 |
| MOTU_15 | 0.000000 | 0.00000 | MOTU_16 | 0.67115 |
| MOTU_16 | 0.000000 | 0.00000 | MOTU_15 | 0.67115 |
| MOTU_17 | 0.083475 | 0.16695 | MOTU_18 | 0.67024 |
| MOTU_18 | 0.061102 | 0.16772 | MOTU_17 | 0.67024 |
PTP.plot_max_min()
PTP.plot_freq()
PTP.plot_heatmap()
In this case we will calulate MOTUs using bPTP species delimitaiton method (Zhang et al., 2013) and obtain the same graphics that the nominal analysis! Note the difference in the graphics.
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
tree = './data/Megaleporinus/Megaleporinus_tree.nwk'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta,tree)
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs
bPTP=SPdel.run_bPTP(basepath,Inputs)
Estimated number of species is between 17 and 21 Mean: 18.56 bPTP finished running with the following parameters: MCMC iterations:................10000 MCMC sampling interval:.........100 MCMC burn-in:...................0.10 MCMC seed:......................1234 ##################### bPTP MOTUs ##################### #####MOTU_01##### LS_muy_L913, LS_muy_L915, LS_muy_L914 #####MOTU_02##### LS_muy_L907 #####MOTU_03##### LS_brn_L930, LS_brn_L931, LS_brn_L932 #####MOTU_04##### LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295 #####MOTU_05##### LS_tri_L179, LS_tri_L180, LS_tri_L182 #####MOTU_06##### LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292 #####MOTU_07##### LS_con_L210, LS_con_L211 #####MOTU_08##### LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621 #####MOTU_09##### LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102 #####MOTU_10##### LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047 #####MOTU_11##### LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320 #####MOTU_12##### LS_obt_L084 #####MOTU_13##### LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088 #####MOTU_14##### LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290 #####MOTU_15##### LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 #####MOTU_16##### LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353 #####MOTU_17##### LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144 #####MOTU_18##### LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099 Using k2p distance
<Figure size 432x288 with 0 Axes>
bPTP.print_summary()
| Mean | Max | NN | DtoNN | |
|---|---|---|---|---|
| MOTU_01 | 0.000000 | 0.00000 | MOTU_03 | 11.60407 |
| MOTU_02 | NaN | NaN | MOTU_08 | 7.47916 |
| MOTU_03 | 0.000000 | 0.00000 | MOTU_09 | 6.77516 |
| MOTU_04 | 0.000000 | 0.00000 | MOTU_09 | 7.68126 |
| MOTU_05 | 0.000000 | 0.00000 | MOTU_08 | 6.33176 |
| MOTU_06 | 0.000000 | 0.00000 | MOTU_07 | 3.98825 |
| MOTU_07 | 0.000000 | 0.00000 | MOTU_06 | 3.98825 |
| MOTU_08 | 0.000000 | 0.00000 | MOTU_14 | 4.51779 |
| MOTU_09 | 0.143081 | 0.50167 | MOTU_11 | 2.84291 |
| MOTU_10 | 0.037100 | 0.16695 | MOTU_11 | 2.73593 |
| MOTU_11 | 0.000000 | 0.00000 | MOTU_10 | 2.73593 |
| MOTU_12 | NaN | NaN | MOTU_17 | 2.90372 |
| MOTU_13 | 0.136404 | 0.34101 | MOTU_14 | 1.55005 |
| MOTU_14 | 0.000000 | 0.00000 | MOTU_13 | 1.55005 |
| MOTU_15 | 0.000000 | 0.00000 | MOTU_16 | 0.67115 |
| MOTU_16 | 0.000000 | 0.00000 | MOTU_15 | 0.67115 |
| MOTU_17 | 0.083475 | 0.16695 | MOTU_18 | 0.67024 |
| MOTU_18 | 0.061102 | 0.16772 | MOTU_17 | 0.67024 |
bPTP.plot_max_min()
bPTP.plot_freq()
bPTP.plot_heatmap()
As bPTP is a bayesian implementation, sometime you would need more iteration. You can change the default niter='10000', sample='100', burnin='0.1' as your convenience.
niter='100000'
sample='1000'
burnin='0.1'
bPTP=SPdel.run_bPTP(basepath,Inputs,niter,sample,burnin)
Estimated number of species is between 17 and 21 Mean: 18.29 bPTP finished running with the following parameters: MCMC iterations:................100000 MCMC sampling interval:.........1000 MCMC burn-in:...................0.10 MCMC seed:......................1234 ##################### bPTP MOTUs ##################### #####MOTU_01##### LS_muy_L913, LS_muy_L915, LS_muy_L914 #####MOTU_02##### LS_muy_L907 #####MOTU_03##### LS_brn_L930, LS_brn_L931, LS_brn_L932 #####MOTU_04##### LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295 #####MOTU_05##### LS_tri_L179, LS_tri_L180, LS_tri_L182 #####MOTU_06##### LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292 #####MOTU_07##### LS_con_L210, LS_con_L211 #####MOTU_08##### LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621 #####MOTU_09##### LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102 #####MOTU_10##### LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047 #####MOTU_11##### LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320 #####MOTU_12##### LS_obt_L084 #####MOTU_13##### LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088 #####MOTU_14##### LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290 #####MOTU_15##### LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 #####MOTU_16##### LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353 #####MOTU_17##### LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144 #####MOTU_18##### LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099 Using k2p distance
<Figure size 432x288 with 0 Axes>
Also, you can calculate the bPTP using the bPTP web server (https://species.h-its.org/) and use the output file.
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
bPTP_out= './data/Megaleporinus/bPTP.PTPhSupportPartition.txt'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta)
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs
bPTP=SPdel.run_bPTPList(basepath,Inputs,bPTP_out)
##################### bPTP MOTUs ##################### #####MOTU_01##### LS_muy_L913, LS_muy_L915, LS_muy_L914 #####MOTU_02##### LS_muy_L907 #####MOTU_03##### LS_brn_L930, LS_brn_L931, LS_brn_L932 #####MOTU_04##### LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295 #####MOTU_05##### LS_tri_L179, LS_tri_L180, LS_tri_L182 #####MOTU_06##### LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292 #####MOTU_07##### LS_con_L210, LS_con_L211 #####MOTU_08##### LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621 #####MOTU_09##### LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102 #####MOTU_10##### LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047 #####MOTU_11##### LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320 #####MOTU_12##### LS_obt_L084 #####MOTU_13##### LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088 #####MOTU_14##### LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290 #####MOTU_15##### LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 #####MOTU_16##### LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353 #####MOTU_17##### LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144 #####MOTU_18##### LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099 Using k2p distance
bPTP.print_summary()
| Mean | Max | NN | DtoNN | |
|---|---|---|---|---|
| MOTU_01 | 0.000000 | 0.00000 | MOTU_03 | 11.60407 |
| MOTU_02 | NaN | NaN | MOTU_08 | 7.47916 |
| MOTU_03 | 0.000000 | 0.00000 | MOTU_09 | 6.77516 |
| MOTU_04 | 0.000000 | 0.00000 | MOTU_09 | 7.68126 |
| MOTU_05 | 0.000000 | 0.00000 | MOTU_08 | 6.33176 |
| MOTU_06 | 0.000000 | 0.00000 | MOTU_07 | 3.98825 |
| MOTU_07 | 0.000000 | 0.00000 | MOTU_06 | 3.98825 |
| MOTU_08 | 0.000000 | 0.00000 | MOTU_14 | 4.51779 |
| MOTU_09 | 0.143081 | 0.50167 | MOTU_11 | 2.84291 |
| MOTU_10 | 0.037100 | 0.16695 | MOTU_11 | 2.73593 |
| MOTU_11 | 0.000000 | 0.00000 | MOTU_10 | 2.73593 |
| MOTU_12 | NaN | NaN | MOTU_17 | 2.90372 |
| MOTU_13 | 0.136404 | 0.34101 | MOTU_14 | 1.55005 |
| MOTU_14 | 0.000000 | 0.00000 | MOTU_13 | 1.55005 |
| MOTU_15 | 0.000000 | 0.00000 | MOTU_16 | 0.67115 |
| MOTU_16 | 0.000000 | 0.00000 | MOTU_15 | 0.67115 |
| MOTU_17 | 0.083475 | 0.16695 | MOTU_18 | 0.67024 |
| MOTU_18 | 0.061102 | 0.16772 | MOTU_17 | 0.67024 |
bPTP.plot_max_min()
bPTP.plot_freq()
bPTP.plot_heatmap()
Another method included in SPdel is the GMYC (Pons, et al. 2006). Of course, you can obtain all same metrics and figures as previous methods
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
tree = './data/Megaleporinus/Megaleporinus_tree.nwk'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta,tree)
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs
GMYC=SPdel.run_GMYC(basepath,Inputs)
Highest llh:1004.3909296154683 Num spe:18 Null llh:968.4341056152681 P-value:2.220446049250313e-16 Final number of estimated species by GMYC: 18 ##################### GMYC MOTUs ##################### #####MOTU_01##### LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102 #####MOTU_02##### LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099 #####MOTU_03##### LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290 #####MOTU_04##### LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320 #####MOTU_05##### LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088 #####MOTU_06##### LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047 #####MOTU_07##### LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353 #####MOTU_08##### LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621 #####MOTU_09##### LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295 #####MOTU_10##### LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144 #####MOTU_11##### LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292 #####MOTU_12##### LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 #####MOTU_13##### LS_tri_L179, LS_tri_L180, LS_tri_L182 #####MOTU_14##### LS_muy_L913, LS_muy_L915, LS_muy_L914 #####MOTU_15##### LS_brn_L930, LS_brn_L931, LS_brn_L932 #####MOTU_16##### LS_con_L210, LS_con_L211 #####MOTU_17##### LS_obt_L084 #####MOTU_18##### LS_muy_L907 Using k2p distance
GMYC.print_summary()
| Mean | Max | NN | DtoNN | |
|---|---|---|---|---|
| MOTU_01 | 0.143081 | 0.50167 | MOTU_04 | 2.84291 |
| MOTU_02 | 0.058574 | 0.16772 | MOTU_10 | 0.67024 |
| MOTU_03 | 0.000000 | 0.00000 | MOTU_05 | 1.55005 |
| MOTU_04 | 0.000000 | 0.00000 | MOTU_06 | 2.73593 |
| MOTU_05 | 0.136404 | 0.34101 | MOTU_03 | 1.55005 |
| MOTU_06 | 0.037100 | 0.16695 | MOTU_04 | 2.73593 |
| MOTU_07 | 0.000000 | 0.00000 | MOTU_12 | 0.67115 |
| MOTU_08 | 0.000000 | 0.00000 | MOTU_03 | 4.51779 |
| MOTU_09 | 0.000000 | 0.00000 | MOTU_01 | 7.68126 |
| MOTU_10 | 0.083475 | 0.16695 | MOTU_02 | 0.67024 |
| MOTU_11 | 0.000000 | 0.00000 | MOTU_16 | 3.98825 |
| MOTU_12 | 0.000000 | 0.00000 | MOTU_07 | 0.67115 |
| MOTU_13 | 0.000000 | 0.00000 | MOTU_08 | 6.33176 |
| MOTU_14 | 0.000000 | 0.00000 | MOTU_15 | 11.60407 |
| MOTU_15 | 0.000000 | 0.00000 | MOTU_01 | 6.77516 |
| MOTU_16 | 0.000000 | 0.00000 | MOTU_11 | 3.98825 |
| MOTU_17 | NaN | NaN | MOTU_10 | 2.90372 |
| MOTU_18 | NaN | NaN | NaN |
GMYC.plot_max_min()
GMYC.plot_freq()
GMYC.plot_heatmap()
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
GMYC_out= './data/Megaleporinus/GMYC_MOTU.txt'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta)
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs
GMYC=SPdel.run_GMYCList(basepath,Inputs,GMYC_out)
##################### GMYC MOTUs ##################### #####MOTU_01##### LS_obt_B074, LS_obt_L016, LS_obt_B077, LS_obt_B075, LS_obt_L009, LS_obt_L548, LS_obt_L004, LS_obt_L013, LS_obt_L283, LS_obt_L282, LS_obt_L547, LS_obt_L007, LS_obt_L008, LS_obt_B100, LS_obt_B103, LS_obt_B101, LS_obt_B102 #####MOTU_02##### LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099 #####MOTU_03##### LS_mac_B082, LS_mac_L890, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L225, LS_mac_L212, LS_mac_L891, LS_mac_L290 #####MOTU_04##### LS_obt_B031, LS_obt_B090, LS_obt_B070, LS_obt_L268, LS_obt_L266, LS_obt_L316, LS_obt_B071, LS_obt_L267, LS_obt_L269, LS_obt_L253, LS_obt_L314, LS_obt_L315, LS_obt_L320 #####MOTU_05##### LS_mac_B061, LS_mac_B086, LS_mac_B089, LS_mac_B087, LS_mac_B088 #####MOTU_06##### LS_elo_L1041, LS_elo_L287, LS_elo_L309, LS_elo_L1046, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L1047 #####MOTU_07##### LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353 #####MOTU_08##### LS_tri_L519, LS_tri_L690, LS_tri_L618, LS_tri_L955, LS_tri_L621 #####MOTU_09##### LS_gar_L293, LS_gar_L298, LS_gar_L294, LS_gar_L296, LS_gar_L295 #####MOTU_10##### LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144 #####MOTU_11##### LS_con_L286, LS_con_L820, LS_con_L291, LS_con_L292 #####MOTU_12##### LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 #####MOTU_13##### LS_tri_L179, LS_tri_L180, LS_tri_L182 #####MOTU_14##### LS_muy_L913, LS_muy_L915, LS_muy_L914 #####MOTU_15##### LS_brn_L930, LS_brn_L931, LS_brn_L932 #####MOTU_16##### LS_con_L210, LS_con_L211 #####MOTU_17##### LS_obt_L084 #####MOTU_18##### LS_muy_L907 Using k2p distance
GMYC.print_summary()
| Mean | Max | NN | DtoNN | |
|---|---|---|---|---|
| MOTU_01 | 0.143081 | 0.50167 | MOTU_04 | 2.84291 |
| MOTU_02 | 0.058574 | 0.16772 | MOTU_10 | 0.67024 |
| MOTU_03 | 0.000000 | 0.00000 | MOTU_05 | 1.55005 |
| MOTU_04 | 0.000000 | 0.00000 | MOTU_06 | 2.73593 |
| MOTU_05 | 0.136404 | 0.34101 | MOTU_03 | 1.55005 |
| MOTU_06 | 0.037100 | 0.16695 | MOTU_04 | 2.73593 |
| MOTU_07 | 0.000000 | 0.00000 | MOTU_12 | 0.67115 |
| MOTU_08 | 0.000000 | 0.00000 | MOTU_03 | 4.51779 |
| MOTU_09 | 0.000000 | 0.00000 | MOTU_01 | 7.68126 |
| MOTU_10 | 0.083475 | 0.16695 | MOTU_02 | 0.67024 |
| MOTU_11 | 0.000000 | 0.00000 | MOTU_16 | 3.98825 |
| MOTU_12 | 0.000000 | 0.00000 | MOTU_07 | 0.67115 |
| MOTU_13 | 0.000000 | 0.00000 | MOTU_08 | 6.33176 |
| MOTU_14 | 0.000000 | 0.00000 | MOTU_15 | 11.60407 |
| MOTU_15 | 0.000000 | 0.00000 | MOTU_01 | 6.77516 |
| MOTU_16 | 0.000000 | 0.00000 | MOTU_11 | 3.98825 |
| MOTU_17 | NaN | NaN | MOTU_10 | 2.90372 |
| MOTU_18 | NaN | NaN | NaN |
GMYC.plot_max_min()
GMYC.plot_freq()
GMYC.plot_heatmap()
Here you can provide a CSV file with several delimitation results. Note that CSV need to have a head row with the analyses names, that should be used then for print tables and figures.
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
csv_file= './data/Megaleporinus/BIN_list.csv'
basepath=os.path.dirname(fasta)
Inputs=SPdel.reading_data(fasta)
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs
csv_motus=SPdel.run_csvList(basepath,Inputs,csv_file)
##################### BIN MOTUs ##################### #####MOTU_AAB8569##### LS_piv_B060, LS_piv_B076, LS_piv_B078, LS_piv_B093, LS_piv_B094, LS_piv_B095, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099, LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144, LS_piv_L002, LS_piv_L003, LS_piv_L005, LS_piv_L006, LS_piv_L010, LS_piv_L011, LS_piv_L014, LS_piv_L284, LS_piv_L371 #####MOTU_AAB8578##### LS_obt_B074, LS_obt_B075, LS_obt_B077, LS_obt_B100, LS_obt_B101, LS_obt_B102, LS_obt_B103, LS_obt_L004, LS_obt_L007, LS_obt_L008, LS_obt_L009, LS_obt_L013, LS_obt_L016, LS_obt_L282, LS_obt_L283, LS_obt_L547, LS_obt_L548 #####MOTU_AAD1729##### LS_rei_B072, LS_rei_B073, LS_rei_L338, LS_rei_L339, LS_rei_L341, LS_rei_L342, LS_rei_L343, LS_rei_L353, LS_rei_L355, LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 #####MOTU_AAE5328##### LS_mac_B082, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L212, LS_mac_L225, LS_mac_L290, LS_mac_L890, LS_mac_L891 #####MOTU_ABY2894##### LS_elo_L1041, LS_elo_L1046, LS_elo_L1047, LS_elo_L287, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L309 #####MOTU_ABZ0928##### LS_obt_B031, LS_obt_B070, LS_obt_B071, LS_obt_B090, LS_obt_L253, LS_obt_L266, LS_obt_L267, LS_obt_L268, LS_obt_L269, LS_obt_L314, LS_obt_L315, LS_obt_L316, LS_obt_L320 #####MOTU_ACL3073##### LS_tri_L519, LS_tri_L618, LS_tri_L621, LS_tri_L690, LS_tri_L955 #####MOTU_ACL3074##### LS_tri_L179, LS_tri_L180, LS_tri_L182 #####MOTU_ACL3227##### LS_gar_L293, LS_gar_L294, LS_gar_L295, LS_gar_L296, LS_gar_L298 #####MOTU_ACL3731##### LS_con_L210, LS_con_L211 #####MOTU_ACL3942##### LS_obt_L084 #####MOTU_ACL4264##### LS_con_L286, LS_con_L291, LS_con_L292, LS_con_L820 #####MOTU_ACO1303##### LS_mac_B061, LS_mac_B086, LS_mac_B087, LS_mac_B088, LS_mac_B089 #####MOTU_ADB0463##### LS_brn_L930, LS_brn_L931, LS_brn_L932 #####MOTU_ADB0512##### LS_muy_L907 #####MOTU_ADB0701##### LS_muy_L913, LS_muy_L914, LS_muy_L915 Using k2p distance
Note here the use of the analysis name ('BIN') to call the function
csv_motus['BIN'].print_summary()
| Mean | Max | NN | DtoNN | |
|---|---|---|---|---|
| MOTU_AAB8569 | 0.266286 | 1.00758 | MOTU_ACL3942 | 2.90372 |
| MOTU_AAB8578 | 0.143081 | 0.50167 | MOTU_ABZ0928 | 2.84291 |
| MOTU_AAD1729 | 0.316828 | 0.70177 | MOTU_ACL4264 | 6.14484 |
| MOTU_AAE5328 | 0.000000 | 0.00000 | MOTU_ACO1303 | 1.55005 |
| MOTU_ABY2894 | 0.037100 | 0.16695 | MOTU_ABZ0928 | 2.73593 |
| MOTU_ABZ0928 | 0.000000 | 0.00000 | MOTU_ABY2894 | 2.73593 |
| MOTU_ACL3073 | 0.000000 | 0.00000 | MOTU_AAE5328 | 4.51779 |
| MOTU_ACL3074 | 0.000000 | 0.00000 | MOTU_ACL3073 | 6.33176 |
| MOTU_ACL3227 | 0.000000 | 0.00000 | MOTU_AAB8578 | 7.68126 |
| MOTU_ACL3731 | 0.000000 | 0.00000 | MOTU_ACL4264 | 3.98825 |
| MOTU_ACL3942 | NaN | NaN | MOTU_AAB8569 | 2.90372 |
| MOTU_ACL4264 | 0.000000 | 0.00000 | MOTU_ACL3731 | 3.98825 |
| MOTU_ACO1303 | 0.136404 | 0.34101 | MOTU_AAE5328 | 1.55005 |
| MOTU_ADB0463 | 0.000000 | 0.00000 | MOTU_AAB8578 | 6.77516 |
| MOTU_ADB0512 | NaN | NaN | MOTU_ACL3073 | 7.47916 |
| MOTU_ADB0701 | 0.000000 | 0.00000 | MOTU_ADB0463 | 11.60407 |
csv_motus['BIN'].plot_max_min()
csv_motus['BIN'].plot_freq()
csv_motus['BIN'].plot_heatmap()
One of the main function of SPdel is to compare the diferent delimitation methods and produce Consensus MOTUS. We encourage researcher to include different methods, with different philosophies to strengthen the analysis. Also, your can obtain the same graphics as the other methods and aditionally a tree figure including all methods compared.
fasta = './data/Megaleporinus/Megaleporinus_COI.fasta'
tree = './data/Megaleporinus/Megaleporinus_tree.nwk'
basepath=os.path.dirname(fasta)
Compare_list = 'All'
Inputs=SPdel.reading_data(fasta,tree)
############################################################################ SPdel v2.0 - Species delimitation and statistics for DNA Barcoding data sets ############################################################################ Sequences are aligned (same size) Fasta file with 116 sequences and 600 base pairs
Note that whe Consensus MOTU are classified according their congruence between methods and the nominal taxonomy
compared=SPdel.run_comparison(basepath,Inputs,Compare_list)
##################### Consensus MOTUs ##################### ### All MOTUs match with the taxonomy ### Consensus MOTU 1 [LS_brn_(Nominal)&MOTU_ADB0463_(BIN)&MOTU_03_(bPTP)&MOTU_15_(GMYC)&MOTU_03_(PTP)] LS_brn_L930, LS_brn_L931, LS_brn_L932 Consensus MOTU 2 [LS_elo_(Nominal)&MOTU_ABY2894_(BIN)&MOTU_10_(bPTP)&MOTU_06_(GMYC)&MOTU_10_(PTP)] LS_elo_L1041, LS_elo_L1046, LS_elo_L1047, LS_elo_L287, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L309 Consensus MOTU 3 [LS_gar_(Nominal)&MOTU_ACL3227_(BIN)&MOTU_04_(bPTP)&MOTU_09_(GMYC)&MOTU_04_(PTP)] LS_gar_L293, LS_gar_L294, LS_gar_L295, LS_gar_L296, LS_gar_L298 ### Most MOTUs agree with the taxonomy ### ### All MOTUs do not match the taxonomy### Consensus MOTU 4 [MOTU_AAB8578_(BIN)&MOTU_09_(bPTP)&MOTU_01_(GMYC)&MOTU_09_(PTP)] LS_obt_B074, LS_obt_B075, LS_obt_B077, LS_obt_B100, LS_obt_B101, LS_obt_B102, LS_obt_B103, LS_obt_L004, LS_obt_L007, LS_obt_L008, LS_obt_L009, LS_obt_L013, LS_obt_L016, LS_obt_L282, LS_obt_L283, LS_obt_L547, LS_obt_L548 Consensus MOTU 5 [MOTU_AAE5328_(BIN)&MOTU_14_(bPTP)&MOTU_03_(GMYC)&MOTU_14_(PTP)] LS_mac_B082, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L212, LS_mac_L225, LS_mac_L290, LS_mac_L890, LS_mac_L891 Consensus MOTU 6 [MOTU_ABZ0928_(BIN)&MOTU_11_(bPTP)&MOTU_04_(GMYC)&MOTU_11_(PTP)] LS_obt_B031, LS_obt_B070, LS_obt_B071, LS_obt_B090, LS_obt_L253, LS_obt_L266, LS_obt_L267, LS_obt_L268, LS_obt_L269, LS_obt_L314, LS_obt_L315, LS_obt_L316, LS_obt_L320 Consensus MOTU 7 [MOTU_ACL3073_(BIN)&MOTU_08_(bPTP)&MOTU_08_(GMYC)&MOTU_08_(PTP)] LS_tri_L519, LS_tri_L618, LS_tri_L621, LS_tri_L690, LS_tri_L955 Consensus MOTU 8 [MOTU_ACL3074_(BIN)&MOTU_05_(bPTP)&MOTU_13_(GMYC)&MOTU_05_(PTP)] LS_tri_L179, LS_tri_L180, LS_tri_L182 Consensus MOTU 9 [MOTU_ACL3731_(BIN)&MOTU_07_(bPTP)&MOTU_16_(GMYC)&MOTU_07_(PTP)] LS_con_L210, LS_con_L211 Consensus MOTU 10 [MOTU_ACL3942_(BIN)&MOTU_12_(bPTP)&MOTU_17_(GMYC)&MOTU_12_(PTP)] LS_obt_L084 Consensus MOTU 11 [MOTU_ACL4264_(BIN)&MOTU_06_(bPTP)&MOTU_11_(GMYC)&MOTU_06_(PTP)] LS_con_L286, LS_con_L291, LS_con_L292, LS_con_L820 Consensus MOTU 12 [MOTU_ACO1303_(BIN)&MOTU_13_(bPTP)&MOTU_05_(GMYC)&MOTU_13_(PTP)] LS_mac_B061, LS_mac_B086, LS_mac_B087, LS_mac_B088, LS_mac_B089 Consensus MOTU 13 [MOTU_ADB0512_(BIN)&MOTU_02_(bPTP)&MOTU_18_(GMYC)&MOTU_02_(PTP)] LS_muy_L907 Consensus MOTU 14 [MOTU_ADB0701_(BIN)&MOTU_01_(bPTP)&MOTU_14_(GMYC)&MOTU_01_(PTP)] LS_muy_L913, LS_muy_L914, LS_muy_L915 ### Most MOTUs do not match the taxonomy ### Consensus MOTU 15 [MOTU_15_(bPTP)&MOTU_12_(GMYC)&MOTU_15_(PTP)] LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 Consensus MOTU 16 [MOTU_16_(bPTP)&MOTU_07_(GMYC)&MOTU_16_(PTP)] LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353 Consensus MOTU 17 [MOTU_17_(bPTP)&MOTU_10_(GMYC)&MOTU_17_(PTP)] LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144 Consensus MOTU 18 [MOTU_18_(bPTP)&MOTU_02_(GMYC)&MOTU_18_(PTP)] LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099 Using k2p distance
SPdel.plot_compare_tree(basepath, Inputs.tree, compared[0])
##### Warning: Nominal species not contigous in the tree. #####
Saving tree plot as svg
SPdel.plot_compare_tree(basepath, Inputs.tree, compared[0],save=True)
##### Warning: Nominal species not contigous in the tree. #####
compared[1].print_summary()
| Mean | Max | NN | DtoNN | |
|---|---|---|---|---|
| MOTU_1 | 0.000000 | 0.00000 | MOTU_4 | 6.77516 |
| MOTU_10 | NaN | NaN | MOTU_17 | 2.90372 |
| MOTU_11 | 0.000000 | 0.00000 | MOTU_9 | 3.98825 |
| MOTU_12 | 0.136404 | 0.34101 | MOTU_5 | 1.55005 |
| MOTU_13 | NaN | NaN | MOTU_7 | 7.47916 |
| MOTU_14 | 0.000000 | 0.00000 | MOTU_1 | 11.60407 |
| MOTU_15 | 0.000000 | 0.00000 | MOTU_16 | 0.67115 |
| MOTU_16 | 0.000000 | 0.00000 | MOTU_15 | 0.67115 |
| MOTU_17 | 0.083475 | 0.16695 | MOTU_18 | 0.67024 |
| MOTU_18 | 0.058574 | 0.16772 | MOTU_17 | 0.67024 |
| MOTU_2 | 0.037100 | 0.16695 | MOTU_6 | 2.73593 |
| MOTU_3 | 0.000000 | 0.00000 | MOTU_4 | 7.68126 |
| MOTU_4 | 0.143081 | 0.50167 | MOTU_6 | 2.84291 |
| MOTU_5 | 0.000000 | 0.00000 | MOTU_12 | 1.55005 |
| MOTU_6 | 0.000000 | 0.00000 | MOTU_2 | 2.73593 |
| MOTU_7 | 0.000000 | 0.00000 | MOTU_5 | 4.51779 |
| MOTU_8 | 0.000000 | 0.00000 | MOTU_7 | 6.33176 |
| MOTU_9 | 0.000000 | 0.00000 | MOTU_11 | 3.98825 |
compared[1].plot_max_min()
compared[1].plot_freq()
compared[1].plot_heatmap()
compared[1].plot_heatmap(upper=3)
Note that in the previous example we use 'All' to indicate all methods previously calculated (and with results in the working folder), however we can indicate the analyses to be included in the comparision
Compare_list = 'Nominal,BIN,bPTP,PTP'
compared=SPdel.run_comparison(basepath,Inputs,Compare_list)
##################### Consensus MOTUs ##################### ### All MOTUs match with the taxonomy ### Consensus MOTU 1 [LS_brn_(Nominal)&MOTU_ADB0463_(BIN)&MOTU_03_(bPTP)&MOTU_03_(PTP)] LS_brn_L930, LS_brn_L931, LS_brn_L932 Consensus MOTU 2 [LS_elo_(Nominal)&MOTU_ABY2894_(BIN)&MOTU_10_(bPTP)&MOTU_10_(PTP)] LS_elo_L1041, LS_elo_L1046, LS_elo_L1047, LS_elo_L287, LS_elo_L300, LS_elo_L305, LS_elo_L307, LS_elo_L308, LS_elo_L309 Consensus MOTU 3 [LS_gar_(Nominal)&MOTU_ACL3227_(BIN)&MOTU_04_(bPTP)&MOTU_04_(PTP)] LS_gar_L293, LS_gar_L294, LS_gar_L295, LS_gar_L296, LS_gar_L298 ### Most MOTUs agree with the taxonomy ### ### All MOTUs do not match the taxonomy### Consensus MOTU 4 [MOTU_AAB8578_(BIN)&MOTU_09_(bPTP)&MOTU_09_(PTP)] LS_obt_B074, LS_obt_B075, LS_obt_B077, LS_obt_B100, LS_obt_B101, LS_obt_B102, LS_obt_B103, LS_obt_L004, LS_obt_L007, LS_obt_L008, LS_obt_L009, LS_obt_L013, LS_obt_L016, LS_obt_L282, LS_obt_L283, LS_obt_L547, LS_obt_L548 Consensus MOTU 5 [MOTU_AAE5328_(BIN)&MOTU_14_(bPTP)&MOTU_14_(PTP)] LS_mac_B082, LS_mac_L083, LS_mac_L085, LS_mac_L178, LS_mac_L212, LS_mac_L225, LS_mac_L290, LS_mac_L890, LS_mac_L891 Consensus MOTU 6 [MOTU_ABZ0928_(BIN)&MOTU_11_(bPTP)&MOTU_11_(PTP)] LS_obt_B031, LS_obt_B070, LS_obt_B071, LS_obt_B090, LS_obt_L253, LS_obt_L266, LS_obt_L267, LS_obt_L268, LS_obt_L269, LS_obt_L314, LS_obt_L315, LS_obt_L316, LS_obt_L320 Consensus MOTU 7 [MOTU_ACL3073_(BIN)&MOTU_08_(bPTP)&MOTU_08_(PTP)] LS_tri_L519, LS_tri_L618, LS_tri_L621, LS_tri_L690, LS_tri_L955 Consensus MOTU 8 [MOTU_ACL3074_(BIN)&MOTU_05_(bPTP)&MOTU_05_(PTP)] LS_tri_L179, LS_tri_L180, LS_tri_L182 Consensus MOTU 9 [MOTU_ACL3731_(BIN)&MOTU_07_(bPTP)&MOTU_07_(PTP)] LS_con_L210, LS_con_L211 Consensus MOTU 10 [MOTU_ACL3942_(BIN)&MOTU_12_(bPTP)&MOTU_12_(PTP)] LS_obt_L084 Consensus MOTU 11 [MOTU_ACL4264_(BIN)&MOTU_06_(bPTP)&MOTU_06_(PTP)] LS_con_L286, LS_con_L291, LS_con_L292, LS_con_L820 Consensus MOTU 12 [MOTU_ACO1303_(BIN)&MOTU_13_(bPTP)&MOTU_13_(PTP)] LS_mac_B061, LS_mac_B086, LS_mac_B087, LS_mac_B088, LS_mac_B089 Consensus MOTU 13 [MOTU_ADB0512_(BIN)&MOTU_02_(bPTP)&MOTU_02_(PTP)] LS_muy_L907 Consensus MOTU 14 [MOTU_ADB0701_(BIN)&MOTU_01_(bPTP)&MOTU_01_(PTP)] LS_muy_L913, LS_muy_L914, LS_muy_L915 ### Most MOTUs do not match the taxonomy ### Consensus MOTU 15 [MOTU_15_(bPTP)&MOTU_15_(PTP)] LS_rei_L773, LS_rei_L777, LS_rei_L778, LS_rei_L779 Consensus MOTU 16 [MOTU_16_(bPTP)&MOTU_16_(PTP)] LS_rei_B072, LS_rei_L342, LS_rei_B073, LS_rei_L341, LS_rei_L338, LS_rei_L355, LS_rei_L339, LS_rei_L343, LS_rei_L353 Consensus MOTU 17 [MOTU_17_(bPTP)&MOTU_17_(PTP)] LS_piv_B140, LS_piv_B141, LS_piv_B142, LS_piv_B144 Consensus MOTU 18 [MOTU_18_(bPTP)&MOTU_18_(PTP)] LS_piv_B060, LS_piv_B095, LS_piv_B093, LS_piv_L002, LS_piv_L003, LS_piv_L011, LS_piv_L010, LS_piv_B076, LS_piv_B094, LS_piv_L284, LS_piv_L371, LS_piv_B078, LS_piv_L005, LS_piv_L006, LS_piv_L014, LS_piv_B096, LS_piv_B097, LS_piv_B098, LS_piv_B099 Using k2p distance
Don't plotting the consensus bar
SPdel.plot_compare_tree(basepath, Inputs.tree, compared[0],nocons=True)
##### Warning: Nominal species not contigous in the tree. #####
SPdel allows you to calculate diagnostic characters in your dataset, following BOLD classification https://v3.boldsystems.org/index.php/resources/handbook?chapter=5_analysis_extended.html§ion=diagnostic_characters.
diagnostic=SPdel.diagnostics(basepath,'Nominal')
diagnostic.summary
| N | Diagnostic | Diag. or Partial | Partial | Part. or Uninformative | Invalid | |
|---|---|---|---|---|---|---|
| LS_brn | 3 | 6 | 1 | 8 | 0 | 0 |
| LS_con | 6 | 1 | 0 | 1 | 1 | 0 |
| LS_elo | 9 | 3 | 0 | 4 | 1 | 0 |
| LS_gar | 5 | 8 | 3 | 4 | 2 | 0 |
| LS_mac | 14 | 3 | 0 | 6 | 2 | 0 |
| LS_muy | 4 | 0 | 0 | 0 | 1 | 0 |
| LS_obt | 31 | 0 | 0 | 0 | 0 | 0 |
| LS_piv | 23 | 0 | 0 | 9 | 1 | 0 |
| LS_rei | 13 | 3 | 0 | 3 | 0 | 0 |
| LS_tri | 8 | 1 | 0 | 4 | 1 | 0 |
diagnostic.plot_legend()
diagnostic.plot_dcs()
However the Diagnostic character analysis has only sense when you compare two or few related species, and not all your dataset (with more species there is more difficult to find diagnostic characters). Remeber that the aim of this analysis is to differentiate two or few related species that have low genetic diversity. So, we can specify the species or MOTUs to be includede as follow:
diagnostic=SPdel.diagnostics(basepath,'Nominal',['LS_obt','LS_piv'])
diagnostic.summary
| N | Diagnostic | Diag. or Partial | Partial | Part. or Uninformative | Invalid | |
|---|---|---|---|---|---|---|
| LS_obt | 31 | 3 | 0 | 4 | 0 | 0 |
| LS_piv | 23 | 3 | 0 | 43 | 19 | 0 |
diagnostic.plot_dcs()
Also, as default the SPdel only included groups with at least 3 individual, you can change that!
diagnostic=SPdel.diagnostics(basepath,'Nominal',n_ind=8)
diagnostic.summary
| N | Diagnostic | Diag. or Partial | Partial | Part. or Uninformative | Invalid | |
|---|---|---|---|---|---|---|
| LS_elo | 9 | 4 | 0 | 5 | 1 | 0 |
| LS_mac | 14 | 7 | 0 | 7 | 0 | 0 |
| LS_obt | 31 | 0 | 0 | 0 | 0 | 0 |
| LS_piv | 23 | 0 | 0 | 10 | 1 | 0 |
| LS_rei | 13 | 7 | 0 | 5 | 0 | 0 |
| LS_tri | 8 | 3 | 0 | 5 | 0 | 0 |
diagnostic.plot_dcs()
Finally, Remeber that you can use the Diagnostic character analysis with any analysis previously calculated
diagnostic=SPdel.diagnostics(basepath,'Consensus',['MOTU_17','MOTU_18'])
diagnostic.summary
| N | Diagnostic | Diag. or Partial | Partial | Part. or Uninformative | Invalid | |
|---|---|---|---|---|---|---|
| MOTU_18 | 19 | 4 | 0 | 1 | 0 | 0 |
| MOTU_17 | 4 | 4 | 0 | 1 | 3 | 0 |
diagnostic.plot_dcs()
Thanks for use SPdel!